Nosana: Using OpenAI GPT-OSS:20B with Ollama

Welcome to this tutorial on running the latest OpenAI open-weight model, gpt-oss:20b, hosted on Nosana, a decentralized GPU network that enables distributed model inference. This setup is ideal when you need more computational power than your local machine can offer, without the hassle of managing your own high-end hardware.

Want to learn how to actually spin up a job on Nosana and retrieve the base_url? Check out the resources on Nosana.com.

We'll cover:

  • Setting up your environment.
  • Connecting to a remote Nosana Base URL.
  • Pulling and interacting with the gpt-oss:20b model.
  • Basic text generation, streaming, and chat.
  • A simple example of function calling.

1. Setup and Installation

First, we need to install the necessary Python libraries. We'll use ollama to communicate with the Ollama server and python-dotenv to manage our environment variables securely.

%pip install ollama python-dotenv

Environment Variables

To connect to our remote server, we need to tell the Ollama client its address. We'll store this in a .env file to keep our configuration clean and separate from our code.

Create a file named .env in the same directory as this notebook and add your remote server URL to it. If you're using Nosana, this will be your unique NOSANA_BASE_URL.

Your .env file should look like this:

NOSANA_BASE_URL=your_nosana_base_url_here

2. Loading Configuration and Connecting

Now, let's load the environment variable from our .env file. The ollama-python library automatically picks up the OLLAMA_HOST environment variable when it is set; since we store the address under our own NOSANA_BASE_URL variable, we create a client and pass the host explicitly.

import os
from dotenv import load_dotenv
import ollama
from IPython.display import display, Markdown

# Load environment variables from .env file
load_dotenv()

# Get the remote server URL from environment variables
ollama_host = os.getenv("NOSANA_BASE_URL")

def short_link(link):
    """Shorten a long URL for display, keeping its start and end."""
    if link and len(link) > 43:
        return link[:20] + '...' + link[-20:]
    return link

if not ollama_host:
    print("NOSANA_BASE_URL environment variable not found!")
    print("Please create a .env file and add your remote server URL.")
else:
    print(f"Connecting to remote Ollama server at: {short_link(ollama_host)}")

# You can create a client explicitly, which is good practice
client = ollama.Client(host=ollama_host)
Connecting to remote Ollama server at: https://4w9w89qshprb...node.k8s.prd.nos.ci/
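
Before pulling anything, a quick sanity check confirms the server is reachable. The snippet below lists the models already present on the remote node; the response field names have varied across ollama-python versions, so the fallback lookup is a defensive assumption:

# Sanity check: list the models already available on the remote server
available = client.list()
for m in available['models']:
    # Newer ollama-python versions expose 'model'; older ones used 'name'
    print(m.get('model') or m.get('name'))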

3. Interacting with the Model

With our connection established, we can start interacting with the gpt-oss:20b model. Unlike the ollama run CLI, the API won't fetch a missing model for you, so we pull it explicitly first; if it's already on the server, the pull completes almost immediately.

model_name = 'gpt-oss:20b'

try:
    display(Markdown(f"Pulling the '{model_name}' model. This may take a while..."))
    client.pull(model_name)
    display(Markdown("Model pulled successfully!"))
except Exception as e:
    display(Markdown(f"Error: {e}"))

Pulling the 'gpt-oss:20b' model. This may take a while...

Model pulled successfully!
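
For a model this size, the blocking pull above gives no feedback while it works. client.pull also accepts stream=True and yields progress updates as layers download; here is a minimal sketch (the exact fields, such as completed and total, are assumptions based on Ollama's progress messages):

# Stream pull progress instead of waiting silently
for progress in client.pull(model_name, stream=True):
    completed, total = progress.get('completed'), progress.get('total')
    if completed and total:
        print(f"{progress.get('status')}: {completed / total:.0%}", end='\r', flush=True)
    else:
        print(progress.get('status'))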

Basic Generation

Let's start with a simple text generation request.

response = client.generate(
    model=model_name,
    prompt='Explain the concept of a Large Language Model in one sentence.'
)

print(response['response'])
A Large Language Model is an AI system trained on vast text data that learns statistical patterns of language so it can generate, translate, or understand text in a way that mimics human style.
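
Generation can be tuned through the options parameter, which maps to Ollama's standard sampling settings. For instance, a lower temperature makes output more deterministic, and num_predict caps the number of generated tokens (the values below are illustrative):

# Same prompt, but with explicit sampling options
response = client.generate(
    model=model_name,
    prompt='Explain the concept of a Large Language Model in one sentence.',
    options={'temperature': 0.2, 'num_predict': 100},
)
print(response['response'])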

Streaming Responses

For more interactive applications, you can stream the response as it's being generated. This is great for showing a real-time typing effect.

stream = client.generate(
    model=model_name,
    prompt='Write a short story about a robot who discovers music in 50 words.',
    stream=True
)

for chunk in stream:
    print(chunk['response'], end='', flush=True)
Steel heart, dormant in the workshop, scanned old vinyl. A crackling needle whispered rhythm. The robot’s circuits sparked, translating harmonies into code. With each chord, gears wavered, emotions blooming. It recorded melodies, breathing life into metal. Music, the universe’s pulse, filled his void, and he sang for eternal resonance always.
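
Note that the stream is a generator, so it can only be consumed once. If you also need the complete text afterwards, accumulate the chunks while printing them (the prompt here is just an illustration):

# Collect streamed chunks so the full response is available afterwards
chunks = []
for chunk in client.generate(model=model_name, prompt='Name three uses of GPUs.', stream=True):
    print(chunk['response'], end='', flush=True)
    chunks.append(chunk['response'])
full_text = ''.join(chunks)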

Chat Interface

The chat method is designed for conversational interactions. The model itself doesn't remember anything between calls; the context comes from the message history you send with each request, as the follow-up example below shows.

messages = [
    {
        'role': 'user',
        'content': 'What is the most important programming language for AI development? Explain in 50 words.'
    }
]

chat_response = client.chat(model=model_name, messages=messages)
display(Markdown(chat_response['message']['content']))

Python remains the cornerstone of AI development, offering extensive libraries (TensorFlow, PyTorch, Scikit‑learn), a clear syntax, and a massive community. Its rapid prototyping, readability, and cross‑platform compatibility make researchers and engineers quickly build, test, and deploy models, keeping AI accessible to all developers in industry and academia alike and beyond.
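
To continue the conversation, append the assistant's reply to messages and send the whole history back with the next question. A follow-up turn might look like this (the follow-up question is illustrative):

# Append the assistant's reply, then ask a question that relies on context
messages.append(chat_response['message'])
messages.append({'role': 'user', 'content': 'And what would be your second choice? Explain in 50 words.'})

follow_up = client.chat(model=model_name, messages=messages)
display(Markdown(follow_up['message']['content']))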

4. Advanced: Function Calling

The gpt-oss models have strong capabilities for function calling (or tool use). This allows the model to request the invocation of a function you've defined in your code to get external information or perform an action.

Here's a simple example where we define a tool to get the weather.

import json
import requests

def get_current_weather(city: str):
    """Get the current weather in a given city using Open-Meteo API"""
    # Geocoding to get latitude and longitude
    geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={city}&count=1"
    geo_resp = requests.get(geo_url)
    geo_data = geo_resp.json()
    if not geo_data.get("results"):
        return json.dumps({"city": city, "temperature": "unknown", "unit": "celsius"})
    lat = geo_data["results"][0]["latitude"]
    lon = geo_data["results"][0]["longitude"]
    # Get current weather
    weather_url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}¤t_weather=true"
    weather_resp = requests.get(weather_url)
    weather_data = weather_resp.json()
    temp = weather_data.get("current_weather", {}).get("temperature")
    if temp is None:
        return json.dumps({"city": city, "temperature": "unknown", "unit": "celsius"})
    return json.dumps({"city": city, "temperature": temp, "unit": "celsius"})

tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather in a given city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The city, e.g., San Francisco',
                    },
                },
                'required': ['city'],
            },
        },
    },
]

messages = [{'role': 'user', 'content': 'What is the weather like in Singapore?'}]

# First, let the model decide which tool to call
response = client.chat(
    model=model_name,
    messages=messages,
    tools=tools,
)

messages.append(response['message'])

# Then, execute the tool and send the result back to the model
if response['message'].get('tool_calls'):
    tool_call = response['message']['tool_calls'][0]
    function_name = tool_call['function']['name']
    function_args = tool_call['function']['arguments']  # Already a dict

    # Dispatch to the matching local function (only one tool is defined here)
    available_functions = {'get_current_weather': get_current_weather}
    function_response = available_functions[function_name](**function_args)

    messages.append(
        {
            'role': 'tool',
            'content': function_response,
        }
    )

    # Get the final response from the model
    final_response = client.chat(model=model_name, messages=messages)
    print(final_response['message']['content'])
else:
    # The model answered directly without requesting a tool
    print(response['message']['content'])
Singapore’s weather is typically hot and humid year‐round. Right now the temperature is about **27 °C** (≈80 °F). It’s in the range of 25–31 °C most days, with high humidity and a chance of brief showers, especially during the monsoon seasons.

Conclusion

Congratulations! 🎉 You've successfully connected to a remote Ollama server, interacted with the gpt-oss:20b model, and even explored its function-calling capabilities.

This remote setup unlocks the ability to work with powerful models from anywhere, without needing a supercomputer at your desk. From here, you can build complex applications, experiment with different models, or fine-tune models for your specific needs.


Ready to explore more models and power your AI apps?

👉 Visit Nosana.com to discover more models and supercharge your AI projects!